{
 "metadata": {
  "name": "",
  "signature": "sha256:3e182762cb13047048d97db09b34a646a983cb94c8f2d94e55d8d9342ddf1714"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<img src=\"http://continuum.io/media/img/continuum_analytics_logo.png\" align=\"right\" width=\"30%\">\n",
      "\n",
      "# Blaze - A Quick Tour\n",
      "\n",
      "Blaze provides a lightweight interface on top of pre-existing computational infrastructure.  This notebook gives a quick overview of how Blaze interacts with a variety of data types."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blaze import Data, by, compute"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Blaze wraps pre-existing data\n",
      "\n",
      "Blaze interacts with normal Python objects.  Operations on Blaze `Data` objects create expression trees, which are evaluated only when a result is needed, for example when the expression is displayed or passed to `compute`.  \n",
      "\n",
      "These expressions have an intuitive NumPy/Pandas-like feel."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data(1)\n",
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "1"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "1"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "dshape(\"int64\")"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x + 1"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "2"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "2"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(type(x + 1))           # the expression itself\n",
      "print(type(compute(x + 1)))  # the computed result"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "<class 'blaze.expr.arithmetic.Add'>\n",
        "<type 'int'>\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Lists\n",
      "\n",
      "Starting small, Blaze interacts happily with collections of data.  \n",
      "\n",
      "It uses Pandas for pretty notebook printing."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data([1, 2, 3, 4, 5])\n",
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>_2</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> 4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> 5</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "   _2\n",
        "0   1\n",
        "1   2\n",
        "2   3\n",
        "3   4\n",
        "4   5"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x[x > 2] * 10"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>_2</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 30</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 40</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 50</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "   _2\n",
        "0  30\n",
        "1  40\n",
        "2  50"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "dshape(\"5 * int64\")"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Tabular, Pandas-like datasets\n",
      "\n",
      "Slightly more exciting, Blaze also operates on tabular data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "L = [[1, 'Alice',   100],\n",
      "     [2, 'Bob',    -200],\n",
      "     [3, 'Charlie', 300],\n",
      "     [4, 'Dennis',  400],\n",
      "     [5, 'Edith',  -500]]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data(L, fields=['id', 'name', 'amount'])\n",
      "x.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "dshape(\"5 * {id: int64, name: string, amount: int64}\")"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>id</th>\n",
        "      <th>name</th>\n",
        "      <th>amount</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td>   Alice</td>\n",
        "      <td> 100</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "      <td>     Bob</td>\n",
        "      <td>-200</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 3</td>\n",
        "      <td> Charlie</td>\n",
        "      <td> 300</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> 4</td>\n",
        "      <td>  Dennis</td>\n",
        "      <td> 400</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> 5</td>\n",
        "      <td>   Edith</td>\n",
        "      <td>-500</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "   id     name  amount\n",
        "0   1    Alice     100\n",
        "1   2      Bob    -200\n",
        "2   3  Charlie     300\n",
        "3   4   Dennis     400\n",
        "4   5    Edith    -500"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "deadbeats = x[x.amount < 0].name\n",
      "deadbeats"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>name</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>   Bob</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> Edith</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "    name\n",
        "0    Bob\n",
        "1  Edith"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Or it can even just drive Pandas\n",
      "\n",
      "Blaze doesn't do the work itself; it tells other systems to do the work.\n",
      "\n",
      "In the previous example, Blaze told Python which for-loops to run.  In this example, it calls the right functions in Pandas.  \n",
      "\n",
      "The user experience is identical; only the performance differs."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from pandas import DataFrame\n",
      "\n",
      "df = DataFrame([[1, 'Alice',   100],\n",
      "                [2, 'Bob',    -200],\n",
      "                [3, 'Charlie', 300],\n",
      "                [4, 'Denis',   400],\n",
      "                [5, 'Edith',  -500]], columns=['id', 'name', 'amount'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>id</th>\n",
        "      <th>name</th>\n",
        "      <th>amount</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td>   Alice</td>\n",
        "      <td> 100</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "      <td>     Bob</td>\n",
        "      <td>-200</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 3</td>\n",
        "      <td> Charlie</td>\n",
        "      <td> 300</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> 4</td>\n",
        "      <td>   Denis</td>\n",
        "      <td> 400</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> 5</td>\n",
        "      <td>   Edith</td>\n",
        "      <td>-500</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 14,
       "text": [
        "   id     name  amount\n",
        "0   1    Alice     100\n",
        "1   2      Bob    -200\n",
        "2   3  Charlie     300\n",
        "3   4    Denis     400\n",
        "4   5    Edith    -500"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data(df)\n",
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>id</th>\n",
        "      <th>name</th>\n",
        "      <th>amount</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td>   Alice</td>\n",
        "      <td> 100</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "      <td>     Bob</td>\n",
        "      <td>-200</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 3</td>\n",
        "      <td> Charlie</td>\n",
        "      <td> 300</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> 4</td>\n",
        "      <td>   Denis</td>\n",
        "      <td> 400</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> 5</td>\n",
        "      <td>   Edith</td>\n",
        "      <td>-500</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "   id     name  amount\n",
        "0   1    Alice     100\n",
        "1   2      Bob    -200\n",
        "2   3  Charlie     300\n",
        "3   4    Denis     400\n",
        "4   5    Edith    -500"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "deadbeats = x[x.amount < 0].name\n",
      "deadbeats"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>name</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>   Bob</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> Edith</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "    name\n",
        "1    Bob\n",
        "4  Edith"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Calling `compute` shows that Blaze returns the same kind of object it was given: from a Pandas `DataFrame` we get back a Pandas `Series`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type(compute(deadbeats))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "pandas.core.series.Series"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Other data types like SQLAlchemy Tables\n",
      "\n",
      "Blaze extends beyond just Python and Pandas (that's the main motivation).  \n",
      "\n",
      "Here it drives SQLAlchemy."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sqlalchemy import Table, Column, MetaData, Integer, String, create_engine\n",
      "\n",
      "tab = Table('bank', MetaData(),\n",
      "            Column('id', Integer),\n",
      "            Column('name', String),\n",
      "            Column('amount', Integer))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data(tab)\n",
      "x.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "dshape(\"var * {id: ?int32, name: ?string, amount: ?int32}\")"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Just as computations on Pandas objects produce Pandas objects, computations on SQLAlchemy tables produce SQLAlchemy `Select` statements.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "deadbeats = x[x.amount < 0].name\n",
      "compute(deadbeats)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "<sqlalchemy.sql.selectable.Select at 0x7f2543f2fc10; Select object>"
       ]
      }
     ],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(compute(deadbeats))  # SQLAlchemy generates actual SQL"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "SELECT bank.name \n",
        "FROM bank \n",
        "WHERE bank.amount < :amount_1\n"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Connect to a real database\n",
      "\n",
      "When we drive a SQLAlchemy table that is connected to a database, we get actual computation rather than just a `Select` statement."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "engine = create_engine('sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 22
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data(engine)\n",
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "Data:       Engine(sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db)<br>DataShape:  {<br>  iris: var * {<br>    sepal_length: ?float64,<br>    sepal_width: ?float64,<br>    petal_length: ?float64,<br>    petal_width: ?float64,<br>    species: ?string<br>  ..."
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "Data:       Engine(sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db)\n",
        "DataShape:  {\n",
        "  iris: var * {\n",
        "    sepal_length: ?float64,\n",
        "    sepal_width: ?float64,\n",
        "    petal_length: ?float64,\n",
        "    petal_width: ?float64,\n",
        "    species: ?string\n",
        "  ..."
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x.iris"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>sepal_length</th>\n",
        "      <th>sepal_width</th>\n",
        "      <th>petal_length</th>\n",
        "      <th>petal_width</th>\n",
        "      <th>species</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0 </th>\n",
        "      <td> 5.1</td>\n",
        "      <td> 3.5</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1 </th>\n",
        "      <td> 4.9</td>\n",
        "      <td> 3.0</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2 </th>\n",
        "      <td> 4.7</td>\n",
        "      <td> 3.2</td>\n",
        "      <td> 1.3</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3 </th>\n",
        "      <td> 4.6</td>\n",
        "      <td> 3.1</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4 </th>\n",
        "      <td> 5.0</td>\n",
        "      <td> 3.6</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5 </th>\n",
        "      <td> 5.4</td>\n",
        "      <td> 3.9</td>\n",
        "      <td> 1.7</td>\n",
        "      <td> 0.4</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6 </th>\n",
        "      <td> 4.6</td>\n",
        "      <td> 3.4</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.3</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7 </th>\n",
        "      <td> 5.0</td>\n",
        "      <td> 3.4</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>8 </th>\n",
        "      <td> 4.4</td>\n",
        "      <td> 2.9</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9 </th>\n",
        "      <td> 4.9</td>\n",
        "      <td> 3.1</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.1</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>10</th>\n",
        "      <td> 5.4</td>\n",
        "      <td> 3.7</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 24,
       "text": [
        "    sepal_length  sepal_width  petal_length  petal_width      species\n",
        "0            5.1          3.5           1.4          0.2  Iris-setosa\n",
        "1            4.9          3.0           1.4          0.2  Iris-setosa\n",
        "2            4.7          3.2           1.3          0.2  Iris-setosa\n",
        "3            4.6          3.1           1.5          0.2  Iris-setosa\n",
        "4            5.0          3.6           1.4          0.2  Iris-setosa\n",
        "5            5.4          3.9           1.7          0.4  Iris-setosa\n",
        "6            4.6          3.4           1.4          0.3  Iris-setosa\n",
        "7            5.0          3.4           1.5          0.2  Iris-setosa\n",
        "8            4.4          2.9           1.4          0.2  Iris-setosa\n",
        "9            4.9          3.1           1.5          0.1  Iris-setosa\n",
        "..."
       ]
      }
     ],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "by(x.iris.species, shortest=x.iris.sepal_length.min(), \n",
      "                    longest=x.iris.sepal_length.max())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>species</th>\n",
        "      <th>longest</th>\n",
        "      <th>shortest</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>     Iris-setosa</td>\n",
        "      <td> 5.8</td>\n",
        "      <td> 4.3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> Iris-versicolor</td>\n",
        "      <td> 7.0</td>\n",
        "      <td> 4.9</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>  Iris-virginica</td>\n",
        "      <td> 7.9</td>\n",
        "      <td> 4.9</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 25,
       "text": [
        "           species  longest  shortest\n",
        "0      Iris-setosa      5.8       4.3\n",
        "1  Iris-versicolor      7.0       4.9\n",
        "2   Iris-virginica      7.9       4.9"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Use URI strings to ease access\n",
      "\n",
      "Often, just figuring out how to produce the relevant Python object is a challenge.\n",
      "\n",
      "Blaze supports URI strings in many formats, so `Data` can be pointed at a resource directly."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data('sqlite:////home/mrocklin/workspace/blaze/blaze/examples/data/iris.db::iris')\n",
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>sepal_length</th>\n",
        "      <th>sepal_width</th>\n",
        "      <th>petal_length</th>\n",
        "      <th>petal_width</th>\n",
        "      <th>species</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0 </th>\n",
        "      <td> 5.1</td>\n",
        "      <td> 3.5</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1 </th>\n",
        "      <td> 4.9</td>\n",
        "      <td> 3.0</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2 </th>\n",
        "      <td> 4.7</td>\n",
        "      <td> 3.2</td>\n",
        "      <td> 1.3</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3 </th>\n",
        "      <td> 4.6</td>\n",
        "      <td> 3.1</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4 </th>\n",
        "      <td> 5.0</td>\n",
        "      <td> 3.6</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5 </th>\n",
        "      <td> 5.4</td>\n",
        "      <td> 3.9</td>\n",
        "      <td> 1.7</td>\n",
        "      <td> 0.4</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6 </th>\n",
        "      <td> 4.6</td>\n",
        "      <td> 3.4</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.3</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7 </th>\n",
        "      <td> 5.0</td>\n",
        "      <td> 3.4</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>8 </th>\n",
        "      <td> 4.4</td>\n",
        "      <td> 2.9</td>\n",
        "      <td> 1.4</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9 </th>\n",
        "      <td> 4.9</td>\n",
        "      <td> 3.1</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.1</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>10</th>\n",
        "      <td> 5.4</td>\n",
        "      <td> 3.7</td>\n",
        "      <td> 1.5</td>\n",
        "      <td> 0.2</td>\n",
        "      <td> Iris-setosa</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 26,
       "text": [
        "    sepal_length  sepal_width  petal_length  petal_width      species\n",
        "0            5.1          3.5           1.4          0.2  Iris-setosa\n",
        "1            4.9          3.0           1.4          0.2  Iris-setosa\n",
        "2            4.7          3.2           1.3          0.2  Iris-setosa\n",
        "3            4.6          3.1           1.5          0.2  Iris-setosa\n",
        "4            5.0          3.6           1.4          0.2  Iris-setosa\n",
        "5            5.4          3.9           1.7          0.4  Iris-setosa\n",
        "6            4.6          3.4           1.4          0.3  Iris-setosa\n",
        "7            5.0          3.4           1.5          0.2  Iris-setosa\n",
        "8            4.4          2.9           1.4          0.2  Iris-setosa\n",
        "9            4.9          3.1           1.5          0.1  Iris-setosa\n",
        "..."
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Once you have SQL, you might as well go big"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data('impala://ec2-54-90-201-28.compute-1.amazonaws.com')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 27
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### MongoDB\n",
      "\n",
      "GitHub's database is mirrored in a Mongo collection hosted in the Netherlands.\n",
      "\n",
      "Here we connect via an SSH tunnel.  See http://ghtorrent.org/ to obtain access."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "users = Data('mongodb://ghtorrentro:ghtorrentro@localhost/github::users')\n",
      "users"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>avatar_url</th>\n",
        "      <th>bio</th>\n",
        "      <th>blog</th>\n",
        "      <th>company</th>\n",
        "      <th>created_at</th>\n",
        "      <th>email</th>\n",
        "      <th>followers</th>\n",
        "      <th>following</th>\n",
        "      <th>gravatar_id</th>\n",
        "      <th>hireable</th>\n",
        "      <th>html_url</th>\n",
        "      <th>id</th>\n",
        "      <th>location</th>\n",
        "      <th>login</th>\n",
        "      <th>name</th>\n",
        "      <th>public_gists</th>\n",
        "      <th>public_repos</th>\n",
        "      <th>type</th>\n",
        "      <th>url</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/a7e55f31bb4...</td>\n",
        "      <td>                                              None</td>\n",
        "      <td>                                         None</td>\n",
        "      <td>                 None</td>\n",
        "      <td> 2012-05-04T13:59:54Z</td>\n",
        "      <td>                     None</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td> a7e55f31bb45321f30211e901cd89ffa</td>\n",
        "      <td>  None</td>\n",
        "      <td> https://github.com/Michaelwussler</td>\n",
        "      <td> 1706010</td>\n",
        "      <td>               None</td>\n",
        "      <td> Michaelwussler</td>\n",
        "      <td>                 None</td>\n",
        "      <td>   0</td>\n",
        "      <td>   3</td>\n",
        "      <td> User</td>\n",
        "      <td> https://api.github.com/users/Michaelwussler</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/eb8139078bc...</td>\n",
        "      <td>                                              None</td>\n",
        "      <td>                                         None</td>\n",
        "      <td>                 None</td>\n",
        "      <td> 2012-05-03T18:47:13Z</td>\n",
        "      <td>                     None</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td> eb8139078bc623dee103ed3917c080dc</td>\n",
        "      <td>  None</td>\n",
        "      <td>        https://github.com/praiser</td>\n",
        "      <td> 1703505</td>\n",
        "      <td>               None</td>\n",
        "      <td>        praiser</td>\n",
        "      <td>                 None</td>\n",
        "      <td>   0</td>\n",
        "      <td>   3</td>\n",
        "      <td> User</td>\n",
        "      <td>        https://api.github.com/users/praiser</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/13c7b665e0c...</td>\n",
        "      <td>                                              None</td>\n",
        "      <td>                                             </td>\n",
        "      <td>                     </td>\n",
        "      <td> 2010-04-07T12:15:00Z</td>\n",
        "      <td>     vad.viktor@gmail.com</td>\n",
        "      <td>   2</td>\n",
        "      <td>   3</td>\n",
        "      <td> 13c7b665e0cbd94e0155387c35957d13</td>\n",
        "      <td> False</td>\n",
        "      <td>      https://github.com/vadviktor</td>\n",
        "      <td>  238703</td>\n",
        "      <td>           Budapest</td>\n",
        "      <td>      vadviktor</td>\n",
        "      <td>           Vad Viktor</td>\n",
        "      <td>   0</td>\n",
        "      <td>  10</td>\n",
        "      <td> User</td>\n",
        "      <td>      https://api.github.com/users/vadviktor</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/b7937805411...</td>\n",
        "      <td>                                                  </td>\n",
        "      <td>                                         None</td>\n",
        "      <td>         Appcelerator</td>\n",
        "      <td> 2012-04-02T16:13:58Z</td>\n",
        "      <td>    yjin@appcelerator.com</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td> b7937805411d278ceb839175e251e2a0</td>\n",
        "      <td> False</td>\n",
        "      <td>          https://github.com/ypjin</td>\n",
        "      <td> 1598831</td>\n",
        "      <td>            Beijing</td>\n",
        "      <td>          ypjin</td>\n",
        "      <td>               Yuping</td>\n",
        "      <td>   0</td>\n",
        "      <td>   5</td>\n",
        "      <td> User</td>\n",
        "      <td>          https://api.github.com/users/ypjin</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/89e109fca84...</td>\n",
        "      <td>                                                  </td>\n",
        "      <td>  http://blogs.perl.org/users/steven_haryanto</td>\n",
        "      <td>                    -</td>\n",
        "      <td> 2010-02-26T01:28:09Z</td>\n",
        "      <td> stevenharyanto@gmail.com</td>\n",
        "      <td>  39</td>\n",
        "      <td> 307</td>\n",
        "      <td> 89e109fca8474e5636c9feef7a8422ea</td>\n",
        "      <td> False</td>\n",
        "      <td>      https://github.com/sharyanto</td>\n",
        "      <td>  211084</td>\n",
        "      <td> Jakarta, Indonesia</td>\n",
        "      <td>      sharyanto</td>\n",
        "      <td>      Steven Haryanto</td>\n",
        "      <td>   5</td>\n",
        "      <td> 195</td>\n",
        "      <td> User</td>\n",
        "      <td>      https://api.github.com/users/sharyanto</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/7490b4e3e9c...</td>\n",
        "      <td> Perl, C, C++, JavaScript, PHP, Haskell, Ruby, ...</td>\n",
        "      <td>                                http://c9s.me</td>\n",
        "      <td>                     </td>\n",
        "      <td> 2009-02-01T15:20:08Z</td>\n",
        "      <td> cornelius.howl@gmail.com</td>\n",
        "      <td> 330</td>\n",
        "      <td> 599</td>\n",
        "      <td> 7490b4e3e9cb85a1f7dc0c8ea01a86e5</td>\n",
        "      <td>  True</td>\n",
        "      <td>            https://github.com/c9s</td>\n",
        "      <td>   50894</td>\n",
        "      <td>     Taipei, Taiwan</td>\n",
        "      <td>            c9s</td>\n",
        "      <td>            Yo-An Lin</td>\n",
        "      <td> 281</td>\n",
        "      <td> 206</td>\n",
        "      <td> User</td>\n",
        "      <td>            https://api.github.com/users/c9s</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/dc078ac4dbd...</td>\n",
        "      <td>                                              None</td>\n",
        "      <td>                            azhari.harahap.us</td>\n",
        "      <td>         CapungRiders</td>\n",
        "      <td> 2010-10-31T05:53:40Z</td>\n",
        "      <td>        azhari@harahap.us</td>\n",
        "      <td>  26</td>\n",
        "      <td>  11</td>\n",
        "      <td> dc078ac4dbdc06d3e3c0ec0b6801b53d</td>\n",
        "      <td> False</td>\n",
        "      <td>      https://github.com/back2arie</td>\n",
        "      <td>  461397</td>\n",
        "      <td>          Indonesia</td>\n",
        "      <td>      back2arie</td>\n",
        "      <td>       Azhari Harahap</td>\n",
        "      <td>   1</td>\n",
        "      <td>  15</td>\n",
        "      <td> User</td>\n",
        "      <td>      https://api.github.com/users/back2arie</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/fb844ffed6c...</td>\n",
        "      <td> Git Ninja and language-agnostic problem solver...</td>\n",
        "      <td>                           http://dukeleto.pl</td>\n",
        "      <td>        Leto Labs LLC</td>\n",
        "      <td> 2008-10-22T03:02:15Z</td>\n",
        "      <td>        jonathan@leto.net</td>\n",
        "      <td> 175</td>\n",
        "      <td> 635</td>\n",
        "      <td> fb844ffed6c5a2e69638627e3b721308</td>\n",
        "      <td>  True</td>\n",
        "      <td>           https://github.com/leto</td>\n",
        "      <td>   30298</td>\n",
        "      <td>       Portland, OR</td>\n",
        "      <td>           leto</td>\n",
        "      <td> Jonathan \"Duke\" Leto</td>\n",
        "      <td> 276</td>\n",
        "      <td> 112</td>\n",
        "      <td> User</td>\n",
        "      <td>           https://api.github.com/users/leto</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>8 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/3843ec7861e...</td>\n",
        "      <td>                                                  </td>\n",
        "      <td>                       http://alanhaggai.org/</td>\n",
        "      <td>      Thought Ripples</td>\n",
        "      <td> 2009-01-13T16:25:15Z</td>\n",
        "      <td>          haggai@cpan.org</td>\n",
        "      <td>  46</td>\n",
        "      <td> 365</td>\n",
        "      <td> 3843ec7861e271e803ea076035d683dd</td>\n",
        "      <td> False</td>\n",
        "      <td>     https://github.com/alanhaggai</td>\n",
        "      <td>   46288</td>\n",
        "      <td>                 IN</td>\n",
        "      <td>     alanhaggai</td>\n",
        "      <td>    Alan Haggai Alavi</td>\n",
        "      <td>   4</td>\n",
        "      <td>  54</td>\n",
        "      <td> User</td>\n",
        "      <td>     https://api.github.com/users/alanhaggai</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9 </th>\n",
        "      <td> https://secure.gravatar.com/avatar/f611628c558...</td>\n",
        "      <td>                                              None</td>\n",
        "      <td>                               arisdottle.net</td>\n",
        "      <td> Team Rooster Pirates</td>\n",
        "      <td> 2009-05-12T19:29:09Z</td>\n",
        "      <td> amiri@roosterpirates.com</td>\n",
        "      <td>  16</td>\n",
        "      <td>  87</td>\n",
        "      <td> f611628c5588f7a0a72c65ec1f94dfb8</td>\n",
        "      <td> False</td>\n",
        "      <td>          https://github.com/amiri</td>\n",
        "      <td>   83806</td>\n",
        "      <td>    Los Angeles, CA</td>\n",
        "      <td>          amiri</td>\n",
        "      <td>      Amiri Barksdale</td>\n",
        "      <td>  16</td>\n",
        "      <td>  18</td>\n",
        "      <td> User</td>\n",
        "      <td>          https://api.github.com/users/amiri</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>10</th>\n",
        "      <td> https://secure.gravatar.com/avatar/c57483c5cfe...</td>\n",
        "      <td>                                              None</td>\n",
        "      <td> http://www.geekfarm.org/wu/muse/WebHome.html</td>\n",
        "      <td>                 None</td>\n",
        "      <td> 2009-02-08T03:28:54Z</td>\n",
        "      <td>       git-c@geekfarm.org</td>\n",
        "      <td>  16</td>\n",
        "      <td>  87</td>\n",
        "      <td> c57483c5cfe159b98a6e33ee7e9eec38</td>\n",
        "      <td> False</td>\n",
        "      <td>             https://github.com/wu</td>\n",
        "      <td>   52700</td>\n",
        "      <td>               None</td>\n",
        "      <td>             wu</td>\n",
        "      <td>           Alex White</td>\n",
        "      <td>   0</td>\n",
        "      <td>  15</td>\n",
        "      <td> User</td>\n",
        "      <td>             https://api.github.com/users/wu</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 28,
       "text": [
        "                                           avatar_url  \\\n",
        "0   https://secure.gravatar.com/avatar/a7e55f31bb4...   \n",
        "1   https://secure.gravatar.com/avatar/eb8139078bc...   \n",
        "2   https://secure.gravatar.com/avatar/13c7b665e0c...   \n",
        "3   https://secure.gravatar.com/avatar/b7937805411...   \n",
        "4   https://secure.gravatar.com/avatar/89e109fca84...   \n",
        "5   https://secure.gravatar.com/avatar/7490b4e3e9c...   \n",
        "6   https://secure.gravatar.com/avatar/dc078ac4dbd...   \n",
        "7   https://secure.gravatar.com/avatar/fb844ffed6c...   \n",
        "8   https://secure.gravatar.com/avatar/3843ec7861e...   \n",
        "9   https://secure.gravatar.com/avatar/f611628c558...   \n",
        "10  https://secure.gravatar.com/avatar/c57483c5cfe...   \n",
        "\n",
        "                                                  bio  \\\n",
        "0                                                None   \n",
        "1                                                None   \n",
        "2                                                None   \n",
        "3                                                       \n",
        "4                                                       \n",
        "5   Perl, C, C++, JavaScript, PHP, Haskell, Ruby, ...   \n",
        "6                                                None   \n",
        "7   Git Ninja and language-agnostic problem solver...   \n",
        "8                                                       \n",
        "9                                                None   \n",
        "10                                               None   \n",
        "\n",
        "                                            blog               company  \\\n",
        "0                                           None                  None   \n",
        "1                                           None                  None   \n",
        "2                                                                        \n",
        "3                                           None          Appcelerator   \n",
        "4    http://blogs.perl.org/users/steven_haryanto                     -   \n",
        "5                                  http://c9s.me                         \n",
        "6                              azhari.harahap.us          CapungRiders   \n",
        "7                             http://dukeleto.pl         Leto Labs LLC   \n",
        "8                         http://alanhaggai.org/       Thought Ripples   \n",
        "9                                 arisdottle.net  Team Rooster Pirates   \n",
        "10  http://www.geekfarm.org/wu/muse/WebHome.html                  None   \n",
        "\n",
        "              created_at                     email  followers  following  \\\n",
        "0   2012-05-04T13:59:54Z                      None          0          0   \n",
        "1   2012-05-03T18:47:13Z                      None          0          0   \n",
        "2   2010-04-07T12:15:00Z      vad.viktor@gmail.com          2          3   \n",
        "3   2012-04-02T16:13:58Z     yjin@appcelerator.com          0          0   \n",
        "4   2010-02-26T01:28:09Z  stevenharyanto@gmail.com         39        307   \n",
        "5   2009-02-01T15:20:08Z  cornelius.howl@gmail.com        330        599   \n",
        "6   2010-10-31T05:53:40Z         azhari@harahap.us         26         11   \n",
        "7   2008-10-22T03:02:15Z         jonathan@leto.net        175        635   \n",
        "8   2009-01-13T16:25:15Z           haggai@cpan.org         46        365   \n",
        "9   2009-05-12T19:29:09Z  amiri@roosterpirates.com         16         87   \n",
        "10  2009-02-08T03:28:54Z        git-c@geekfarm.org         16         87   \n",
        "\n",
        "                         gravatar_id hireable  \\\n",
        "0   a7e55f31bb45321f30211e901cd89ffa     None   \n",
        "1   eb8139078bc623dee103ed3917c080dc     None   \n",
        "2   13c7b665e0cbd94e0155387c35957d13    False   \n",
        "3   b7937805411d278ceb839175e251e2a0    False   \n",
        "4   89e109fca8474e5636c9feef7a8422ea    False   \n",
        "5   7490b4e3e9cb85a1f7dc0c8ea01a86e5     True   \n",
        "6   dc078ac4dbdc06d3e3c0ec0b6801b53d    False   \n",
        "7   fb844ffed6c5a2e69638627e3b721308     True   \n",
        "8   3843ec7861e271e803ea076035d683dd    False   \n",
        "9   f611628c5588f7a0a72c65ec1f94dfb8    False   \n",
        "10  c57483c5cfe159b98a6e33ee7e9eec38    False   \n",
        "\n",
        "                             html_url       id            location  \\\n",
        "0   https://github.com/Michaelwussler  1706010                None   \n",
        "1          https://github.com/praiser  1703505                None   \n",
        "2        https://github.com/vadviktor   238703            Budapest   \n",
        "3            https://github.com/ypjin  1598831             Beijing   \n",
        "4        https://github.com/sharyanto   211084  Jakarta, Indonesia   \n",
        "5              https://github.com/c9s    50894      Taipei, Taiwan   \n",
        "6        https://github.com/back2arie   461397           Indonesia   \n",
        "7             https://github.com/leto    30298        Portland, OR   \n",
        "8       https://github.com/alanhaggai    46288                  IN   \n",
        "9            https://github.com/amiri    83806     Los Angeles, CA   \n",
        "10              https://github.com/wu    52700                None   \n",
        "\n",
        "             login                  name  public_gists  public_repos  type  \\\n",
        "0   Michaelwussler                  None             0             3  User   \n",
        "1          praiser                  None             0             3  User   \n",
        "2        vadviktor            Vad Viktor             0            10  User   \n",
        "3            ypjin                Yuping             0             5  User   \n",
        "4        sharyanto       Steven Haryanto             5           195  User   \n",
        "5              c9s             Yo-An Lin           281           206  User   \n",
        "6        back2arie        Azhari Harahap             1            15  User   \n",
        "7             leto  Jonathan \"Duke\" Leto           276           112  User   \n",
        "8       alanhaggai     Alan Haggai Alavi             4            54  User   \n",
        "9            amiri       Amiri Barksdale            16            18  User   \n",
        "10              wu            Alex White             0            15  User   \n",
        "\n",
        "                                            url  \n",
        "0   https://api.github.com/users/Michaelwussler  \n",
        "1          https://api.github.com/users/praiser  \n",
        "2        https://api.github.com/users/vadviktor  \n",
        "3            https://api.github.com/users/ypjin  \n",
        "4        https://api.github.com/users/sharyanto  \n",
        "5              https://api.github.com/users/c9s  \n",
        "6        https://api.github.com/users/back2arie  \n",
        "7             https://api.github.com/users/leto  \n",
        "8       https://api.github.com/users/alanhaggai  \n",
        "9            https://api.github.com/users/amiri  \n",
        "..."
       ]
      }
     ],
     "prompt_number": 28
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Handle NumPy-like computations\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import h5py\n",
      "f = h5py.File('/home/mrocklin/Downloads/OMI-Aura_L2-OMAERO_2014m1105t2304-o54838_v003-2014m1106t215558.he5')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 29
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = Data(f)\n",
      "x.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 30,
       "text": [
        "dshape(\"\"\"{\n",
        "  HDFEOS: {\n",
        "    ADDITIONAL: {FILE_ATTRIBUTES: {}},\n",
        "    SWATHS: {\n",
        "      ColumnAmountAerosol: {\n",
        "        Data Fields: {\n",
        "          AerosolIndexUV: 1643 * 60 * int16,\n",
        "          AerosolIndexVIS: 1643 * 60 * int16,\n",
        "          AerosolModelMW: 1643 * 60 * uint16,\n",
        "          AerosolModelsPassedThreshold: 1643 * 60 * 10 * uint16,\n",
        "          AerosolOpticalThicknessMW: 1643 * 60 * 14 * int16,\n",
        "          AerosolOpticalThicknessMWPrecision: 1643 * 60 * int16,\n",
        "          AerosolOpticalThicknessNUV: 1643 * 60 * 2 * int16,\n",
        "          AerosolOpticalThicknessPassedThreshold: 1643 * 60 * 10 * 9 * int16,\n",
        "          AerosolOpticalThicknessPassedThresholdMean: 1643 * 60 * 9 * int16,\n",
        "          AerosolOpticalThicknessPassedThresholdStd: 1643 * 60 * 9 * int16,\n",
        "          CloudFlags: 1643 * 60 * uint8,\n",
        "          CloudPressure: 1643 * 60 * int16,\n",
        "          EffectiveCloudFraction: 1643 * 60 * int8,\n",
        "          InstrumentConfigurationId: 1643 * uint8,\n",
        "          MeasurementQualityFlags: 1643 * uint8,\n",
        "          NumberOfModelsPassedThreshold: 1643 * 60 * uint8,\n",
        "          ProcessingQualityFlagsMW: 1643 * 60 * uint16,\n",
        "          ProcessingQualityFlagsNUV: 1643 * 60 * uint16,\n",
        "          RootMeanSquareErrorOfFitPassedThreshold: 1643 * 60 * 10 * int16,\n",
        "          SingleScatteringAlbedoMW: 1643 * 60 * 14 * int16,\n",
        "          SingleScatteringAlbedoMWPrecision: 1643 * 60 * int16,\n",
        "          SingleScatteringAlbedoNUV: 1643 * 60 * 2 * int16,\n",
        "          SingleScatteringAlbedoPassedThreshold: 1643 * 60 * 10 * 9 * int16,\n",
        "          SingleScatteringAlbedoPassedThresholdMean: 1643 * 60 * 9 * int16,\n",
        "          SingleScatteringAlbedoPassedThresholdStd: 1643 * 60 * 9 * int16,\n",
        "          SmallPixelRadiancePointerUV: 1643 * 2 * int16,\n",
        "          SmallPixelRadiancePointerVIS: 1643 * 2 * int16,\n",
        "          SmallPixelRadianceUV: 6783 * 60 * float32,\n",
        "          SmallPixelRadianceVIS: 6786 * 60 * float32,\n",
        "          SmallPixelWavelengthUV: 6783 * 60 * uint16,\n",
        "          SmallPixelWavelengthVIS: 6786 * 60 * uint16,\n",
        "          TerrainPressure: 1643 * 60 * int16,\n",
        "          TerrainReflectivity: 1643 * 60 * 9 * int16,\n",
        "          XTrackQualityFlags: 1643 * 60 * uint8\n",
        "          },\n",
        "        Geolocation Fields: {\n",
        "          GroundPixelQualityFlags: 1643 * 60 * uint16,\n",
        "          Latitude: 1643 * 60 * float32,\n",
        "          Longitude: 1643 * 60 * float32,\n",
        "          OrbitPhase: 1643 * float32,\n",
        "          SolarAzimuthAngle: 1643 * 60 * float32,\n",
        "          SolarZenithAngle: 1643 * 60 * float32,\n",
        "          SpacecraftAltitude: 1643 * float32,\n",
        "          SpacecraftLatitude: 1643 * float32,\n",
        "          SpacecraftLongitude: 1643 * float32,\n",
        "          TerrainHeight: 1643 * 60 * int16,\n",
        "          Time: 1643 * float64,\n",
        "          ViewingAzimuthAngle: 1643 * 60 * float32,\n",
        "          ViewingZenithAngle: 1643 * 60 * float32\n",
        "          }\n",
        "        }\n",
        "      }\n",
        "    },\n",
        "  HDFEOS INFORMATION: {\n",
        "    ArchiveMetadata.0: string[65535, 'A'],\n",
        "    CoreMetadata.0: string[65535, 'A'],\n",
        "    StructMetadata.0: string[32000, 'A']\n",
        "    }\n",
        "  }\"\"\")"
       ]
      }
     ],
     "prompt_number": 30
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x.HDFEOS.SWATHS.ColumnAmountAerosol.Data_Fields.CloudPressure"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "array([[-32767, -32767, -32767, ..., -32767, -32767, -32767],<br>       [-32767, -32767, -32767, ..., -32767, -32767, -32767],<br>       [-32767, -32767, -32767, ..., -32767, -32767, -32767],<br>       ..., <br>       [-32767, -32767, -32767, ..., -32767, -32767, -32767],<br>       [-32767, -32767, -32767, ..., -32767, -32767, -32767],<br>       [-32767, -32767, -32767, ..., -32767, -32767, -32767]], dtype=int16)"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 31,
       "text": [
        "array([[-32767, -32767, -32767, ..., -32767, -32767, -32767],\n",
        "       [-32767, -32767, -32767, ..., -32767, -32767, -32767],\n",
        "       [-32767, -32767, -32767, ..., -32767, -32767, -32767],\n",
        "       ..., \n",
        "       [-32767, -32767, -32767, ..., -32767, -32767, -32767],\n",
        "       [-32767, -32767, -32767, ..., -32767, -32767, -32767],\n",
        "       [-32767, -32767, -32767, ..., -32767, -32767, -32767]], dtype=int16)"
       ]
      }
     ],
     "prompt_number": 31
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x.HDFEOS.SWATHS.ColumnAmountAerosol.Data_Fields.CloudPressure.max()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "1013"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 32,
       "text": [
        "1013"
       ]
      }
     ],
     "prompt_number": 32
    }
   ],
   "metadata": {}
  }
 ]
}